Computer and Modernization ›› 2013, Vol. 1 ›› Issue (9): 31-34. doi: 10.3969/j.issn.1006-2475.2013.09.007

• Artificial Intelligence •

Incomplete Data Information Entropy Classification Algorithm Based on AdaBoost

LYU Jing1, SHU Li-lian2

  1. School of Computer Science and Technology, Anhui University, Hefei 230601, China; 2. Jiangxi Institute of Computing Technology, Nanchang 330002, China
  • Received: 2013-03-29  Revised: 1900-01-01  Online: 2013-09-17  Published: 2013-09-17

Abstract: Existing ensemble classification algorithms for incomplete data do not take the differences among missing attributes into account: when weighting the sub-classifiers, they consider only the size of each sub-dataset and the number of attributes it contains, ignoring how strongly the attributes of the different sub-datasets differ. In this paper, information entropy is used to quantify the importance of each sub-dataset, and the weight of the classifier built on that sub-dataset is derived from this measure, which makes the final weighted voting fairer and the prediction more accurate. Experiments on UCI datasets, using an ensemble classification algorithm based on multi-class AdaBoost with BP neural networks as base classifiers, show that the proposed algorithm improves the classification accuracy on incomplete data to a certain extent.
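The abstract gives no code, so the following is only a minimal Python sketch of the entropy-weighted voting idea it describes. The function names (attribute_entropy, subdataset_weight, entropy_weighted_vote), the histogram-based entropy estimate, and the choice to score a sub-dataset by the sum of its attribute entropies are illustrative assumptions, not the authors' implementation; the paper's actual method combines this kind of weighting with multi-class AdaBoost over BP base classifiers.

```python
import numpy as np

def attribute_entropy(column, bins=10):
    """Shannon entropy (in bits) of one attribute, estimated from a histogram."""
    counts, _ = np.histogram(column, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def subdataset_weight(X_sub):
    """Score a sub-dataset by the total information entropy of its attributes."""
    return sum(attribute_entropy(X_sub[:, j]) for j in range(X_sub.shape[1]))

def entropy_weighted_vote(predictions, weights, n_classes):
    """Combine label predictions from several sub-classifiers by weighted voting."""
    votes = np.zeros((len(predictions[0]), n_classes))
    for pred, w in zip(predictions, weights):
        for i, c in enumerate(pred):
            votes[i, c] += w
    return votes.argmax(axis=1)

# Toy usage: two complete-attribute sub-datasets of different width receive
# entropy-based weights, which then scale their classifiers' votes.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 3))   # sub-dataset with 3 complete attributes
X2 = rng.normal(size=(50, 5))   # sub-dataset with 5 complete attributes
weights = [subdataset_weight(X1), subdataset_weight(X2)]
preds = [rng.integers(0, 3, size=20), rng.integers(0, 3, size=20)]  # stand-ins for classifier outputs
print(entropy_weighted_vote(preds, weights, n_classes=3))
```

In the method described by the abstract, each prediction vector would come from a BP network trained on its sub-dataset and boosted with multi-class AdaBoost; random labels stand in for those outputs here only to keep the sketch self-contained.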

Key words: multi-class AdaBoost, information entropy, incomplete data, ensemble classification

CLC Number: